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Abstract 

We consider the well known Least Recently Used (LRU) replacement algorithm and analyze 
it under the independent reference model and generalized power-law demand. For this extensive 
family of demand distributions we derive a closed-form expression for the per object steady-state 
hit ratio. To the best of our knowledge, this is the first analytic derivation of the per object 
hit ratio of LRU that can be obtained in constant time without requiring laborious numeric 
computations or simulation. Since most applications of replacement algorithms include (at 
least) some scenarios under i.i.d. requests, our method has substantial practical value, especially 
when having to analyze multiple caches, where existing numeric methods and simulation become 
too time consuming. 

1 Introduction 

Although very simple in both conception and implementation, the LRU replacement algorithm is 
notoriously hard in terms of analysis. Attempts to obtain the per object steady-state hit ratio in 
an LRU operated cache under the independent reference model (IRM) p] date back to the early 
70's and have continued appearing in the literature until very recently [21 El H] • As elaborated 
later on in this article, such attempts yield either (1) intractable numeric methods for obtaining 
the exact hit probabilities [SJ [T] [B] , (2) tractable numeric methods for obtaining approximate hit 
probabilities [7[ [H dj [31 E] , or (3) asymptotic results under infinite number of objects and infinite 
storage capacity [TO] . In this article we derive for the first time a closed- form formula that can be 
used for obtaining approximate hit probabilities in constant time, i.e., without numeric computation 
that depends on input parameters like the number of objects and the storage capacity. Although 
previous approximate numeric methods are fast (linear complexity), being able to compute the hit 
probabilities in constant time gives a significant advantage, especially when the object universe is 
large or when there are more than one caches to be analyzed. Examples include networks of inter- 
connected cooperative caches [TTJ [T2j [13] , peer-to-peer caching systems [14| , semantic caching and 
query processing [T5] , 
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We achieve the aforementioned result for generalized power-law demand distributions [161 117] . 
Our interest on this family is based on the fact that such popularity profiles have been observed 
in many real- world measurement studies related to replacement algorithms, including [18U19j . It is 
also quite a versatile family as it includes a wide range of profiles, from uniform (having skewness 
parameter a = 0) to Zipf (having skewness parameter a = 1). Most new applications that include a 
cache0 that operates under LRU replacement, typically include among others, experimental results 
under power-law popularity; in these cases our closed-form method can be used instead of laborious 
numeric methods or simulation. 



2 Related Work 

The problem of analyzing the hit ratio of LRU can be traced back to the 70's. King [5] was the first 
to derive the steady-state behavior of LRU under IRM. Initial attempts employed a Markov chain 
to model the contents of a cache operating under LRU. Unfortunately, such attempts give rise to 
huge Markov chains, having C\(%) states (where N denotes the total number of distinct objects, 
and C denotes the capacity of the cache in unit-sized objects); numerical results for such chains 
can only be derived for very small N and C. More efficient steady-state formulas have been derived 
by avoiding the use of Markov chains, and instead making combinatorial arguments; see Koffman 
and Denning pQ, and Starobinski and Tse [6]. However, such approaches still incur a computational 
complexity that is exponential in N and C. Flajolet et al. [7j have presented integral expressions 
for the hit ratio, which can be approximated using numerical integration at complexity O(NC). 
Dan and Towsley [5] have derived an O(NC) iterative method for the approximation of the hit 
ratio. Jalenkovic [9] has provided a closed form expression for the particular case of generalized 
power-law demand with skewness parameter a > 1, for the asymptotic case, N, C — > oo. The same 
author has shown that the hit ratio of LRU under such demand is asymptotically insensitive for 
large caches, i.e., C — > oo, to temporal correlations of the request arrival process [TO]. The most 
recent attempts on the analysis of LRU can be found in [31 [3J 2] . These works build on the notion 
of characteristic time, which is also used in our work. More details on these works and the concept 
of the characteristic time are given in the following sections. 



3 Background and Scope 

Consider an object set O — {o\, . . . , on}, where Oi denotes the ith unit-sized object. Assume that 
requests are issued for the objects of O and that successive requests are independent and identically 

1 We would like at this point to emphasize the distinction between caching systems and replacement algorithms. A 
caching system involves many more design choices other than the particular replacement algorithm (there are issues 
of associativity, multi- level hierarchical structure, and others). The current article is about analyzing a particular 
replacement algorithm and does not make any claims about the more general problem of designing cache memories. 

2 The independent reference model [TJ is commonly used to characterize cache access patterns |201|18j . The impact 
of temporal correlations was shown in 1211 1221 to be minuscule, especially under typical, Zipf-likc object popularity 
profiles. These works showed that temporal correlations decrease rapidly with the distance between any two samples 
so, as long as the cache size is not minuscule, they do not impact fundamentally on the i.i.d. assumption. The 
unit assumption regarding the size of objects is a standard one in all previous works [T|- |10l and stems from the 
desire to avoid adding 0/1-knapsack type complexities to a problem that is already combinatorial. Practically, it 
is justified on the basis that in many caching systems the objects are much smaller than the available cache size. 
Similarly, all previous works assume stationarity of demand over some time horizon. This is supported by many of 
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distributed according to a common probability distribution p = {pi, . . . ,pn}, where pi denotes the 
request probability for the ith most popular object of O (hereafter assumed to be object Oi without 
loss of generality) . The aggregate stream of requests is assumed to be arriving to a cache according 
to a Poisson arrival procesio of rate A requests/unit of time (meaning that the stream of request for 
any given object is also Poisson with rate A ■ pt, 1 < i < N). In Laoutaris et al. [3] we showed that 
under the above mentioned request model, an LRU operated cache with capacity for C unit-sized 
objects reaches a steady-state in which the probability of finding object Oj in the cache is given by: 

Trj = 1 - (1) 

In the above equation, denotes the maximum inter-arrival time between two adjacent request for 
object Oi, both of which lead to hits. This quantity is referred to as the characteristic time of object 
Oi and is due to Che et al. [2]. In essence, ri is a random variable, but it can be approximated by 
a constant in order to carry-out a tractable analysis. This is characterized as a mean field approxi- 
mation in [5] and the rationale behind it is that when the object set is large enough, fluctuates 
closely around its mean value, so it can be effectively approximated by it. The characteristic time 
ri of object Oj, 1 < i < N, was obtained in [2j[3] by solving the following equation numerically: 

N N 

1 - e"^ r< = C => e ~ PiU =N-1-C (2) 

i=i 3=1 

This equation gives the time interval that is required for the N — 1 other objects to generate C 
distinct request^] and thus evict Oj, granted that o; is not re-requested in this interval. However, 
solving N such equations, one for each object, is cumbersome, especially for large N. This can be 
partially alleviated by considering a single characteristic time r for all the objects and thus solving 
only one equation. Such an approximation is justifiable on the basis that the characteristic times ri 
of different objects do not differ substantially, even under skewed popularity distributions. Figure[T] 
supports this claim by illustrating the characteristic times of objects in an LRU cache with capacity 
for C = 100 objects that is driven by requests over an object universe of N = 1000 objects, whose 
popularities follow a generalized power-law with skewness a ~ 0.8 (the request rate for this and 
all subsequent examples is normalized to A = 1 request /unit of time) . The characteristic times are 
obtained by solving Eq. (JJ) numerically. One can observe that although p is skewed, the difference 
between the characteristic times of different objects is very small (thus ri/riooo = 1-011 despite 
that pi/piooo = 251, i.e., two orders of magnitude apart). The plot essentially says that request 
inter-arrivals for the same object that are longer than 134-135 time units lead to misses. 

The approach of using a common characteristic time r for all objects was recently employed by 
Panagakis et al. in 4 . In the same work it was observed that the most natural way of finding 
the common characteristic time is by solving the following normalization equation which simply 



the aforementioned measurement works, over multiple time scales. Obviously, if the demand is non stationary and 
radically changing over small time scales, no analysis can be carried out. 

3 We can alternatively obtain similar results by assuming a Bernoulli arrival process and carrying-out a discrete 
time analysis. We choose to remain on the continuous time domain so as to be aligned with the preceding body of 
work in [2] |3] S]. 

4 Observe that the quantity within the summation is the CDF of the exponential request inter-arrival time for 
object Oj calculated at point ri. 
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Figure 1: An example with the characteristic times calculated by Eq. |(2]l 



requires that all the steady-state object hit probabilities sum up to the capacity of the cache, i.e.: 

N N N 

Y 7T, = c =► y, 1 - e ~ nr = C ^Y1 e ~ p,r = N -° ( 3 ) 

i— 1 i—l i— 1 

The above equation was solved numerically in [4], similarly to the case of [2 [3] and Eq. (2}. In the 
following section, we utilize the notion of characteristic time as developed in [21 El H] and present 
an analysis that leads to the derivation of a closed-form formula for the behavior of LRU caching. 
This is, to the best of our knowledge, the first, non-asymptotic, closed-form approximate formula 
for LRU (the closed-form expression of Jalenkovic in [S] covers only the asymptotic case N, C — * oo 
and is only for a > 1). Our method can be used for the study of LRU caching, whether in stand- 
alone mode (a single LRU cache), or, more interestingly, in hierarchical [H [3] or distributed [13] 
inter-connections of caches, without requiring laborious numeric computations. 



4 Analysis of LRU under Generalized Power-Law Demand 

We assume that p follows a generalized power-law distribution, in which the ith most popular object 
has request probability pi — A/i a , where A = (X^i'=i l^-)" 1 ^ s a normalization constant, and a is 
a skewness parameter. Under such demand, we show how to obtain an approximate closed-form 
formula for the common characteristic time r of Eq. ([3]) . This gives directly a closed- form expression 
for the hit ratio of each object through Eq. {T]). Our analysis can be easily adapted to handling per 
object characteristic times rt. The only difference in this case would be that we would start from 
Eq. © instead of Eq. ©. 

First we take the Taylor series expansion of the exponential form e~ Pil ~ in terms of the variable 
r around point C: 

e - Pi r = e - Pi C . g {-Pi ■ Or - C)f (4) 

fe=0 
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The exponential form e PiC of Eq. © can be similarly expanded in terms of the variable pi around 
point as follows: 
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Using Eqs ©, © in Eq. © we can write: 
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Denoting = (—C) k /k\ and = (— (r — C)) /fc!, and limiting fc to < /c < AT instead of letting 
it run to oo, we can approximate Eq. ([6]) as follows: 
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As will be shown later through numeric examples, the truncation to K has a small effect on the 
accuracy as compared to solving Eq. ([7]) for K — » oo. This owes to the fact that the remainder for 
fc > K of the previous exponential forms ([1]), ([5]) can be bounded by 0(1/ Kl). 

We continue the analysis by putting into use our assumption that pi follows a power-law distri- 
bution, and so we can write: 



E^ = E(£) = Am -E^ = Am -<"\ 
i=i .=i v * / i=i * 



(8) 



(a) 



x — J^iLi l/^ a denotes the ATth Harmonic number of order a. HYf' can be approximated 
n1 ~"~ 1 (see also [23]). Substituting from Eq. © into Eq. we 



where ii 

by its integral expression H^f 1 
obtain our master equation: 
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= N -C 



(9) 



The master equation is a if -order polynomial equation of r (corresponding to an approximate version 
of Eq. © that retains only K + 1 first terms from the Taylor series expansions of the exponential 
forms of Eqs ©, ©). One can solve the master equation in arbitrary accuracy by increasing 
K . This, of course, presumes a numerical solution and, thus, does not differ fundamentally from 
the previous numerical approaches in [HESS]. Where the master equation is essentially different, 
is in that it has a form that can be utilized for setting up a closed-form solution. This can be 
accomplished by selecting appropriately small K that give rise to such results. Such flexibility is 
not provided by Eqs. ©, ©. 
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Consider the case of K = 2. Substituting a m , b m and doing some algebraic manipulation reduces 
the master equation into the following quadratic equation (K — 2 amounts to retaining the first 
three terms of the Taylor series expansions of Eqs JU), ©): 

2 , , n V, A2 rr(2a) ( 3a ) A 4 C 2 ( 4a) 

a 2 r z + air + a = where: a 2 = -y#iv 2~ ^ _ 4~~ ^ 



ax = - Ai#> + ^2$*° " (10) 



a = C H — H N 



A ^(40 

4 



The characteristic time can then be taken by selecting an appropriate real solution (assuming that 
one exists, more on this in the sequel) from the quadratic formula: r = "'^^ 4tt2 ° . 

We can go a step further and consider the case of K = 3 which yields the following cubic 
equation: 

3 . 2 . n v A3 o-(3a) . A 4 C (4a ) A 5 C 2 (5a) A 6 C* 3 (6a) 

as?" + aar- + ot\r + a = where: a 3 = — ^ff^ + —z~H^ ttt h n + ~^^ h n 



6 JV 6 iV 12 JV 36 

A 2 (3.) A 4 C 2 (4a) A 5 C 3 (5a) A 6 ^ 4 (6a) 

T w +— ff iv -^2~ ff JV 



"1 - -A-Hjv H — iijv JV + — [2~ H JV 



a -C-—H N ~-^-H N 

(11) 

The cubic formula [24] (we do not repeat it here due to space considerations) returns the three 
solutions to the above cubic equation expressed as analytic functions of the coefficients 0:3, a%, ol\, ag 
(which, in turn, are analytic functions|£| of the input parameters C, N, a) ; at least one the three 
solutions is always guaranteed to be in the domain of real numbers (such a guarantee does not exist 
for the quadratic equation, for which, both solutions can be complex). Due to this guarantee, and 
also to the fact that it provides a closer approximation by considering an additional term from the 
Taylor expansion, we focus on the K = 3 case]*] Let ta, Tbi rr be the three roots of Eq. (fTTj) returned 
by the cubic formula. We select as characteristic time the smallest real solution rx, X £ {A, B,T} 
that exceeds C, i.e.: 

r = min (r x ) :r x £R,r x > C (12) 

X£{A,BT} 

The rationale behind this choice is that it takes at least C requests to evict a newly inserted object 
so the characteristic time has to be larger than C (the characteristic time is in units of time or 
alternatively in number of requests, since we have normalized the request rate A into 1 req./time 
slot). In the next section we show that the above approximation yields accurate r and tti across a 
wide range of parameters C, N,a. 



5 For the generalized Harmonic number we use its integral approximation as stated earlier on. 

6 Theoretically we could go even further and consider the quartic equation (K = 4). This, however, involves 
very cumbersome formulas for the roots and is marginally valuable since the cubic equation already provides close 
approximation as will be demonstrated in Sect. [5] The quintic and all higher order equations (K > 5) do not posses 
a general solution over the rationals in terms of radicals (the "Abel-Ruffini" theorem). 
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Table 1 : Evaluation of the accuracy of our approximate closed-form formula for the characteristic time on a set of 
N = 1000 objects, under varying cache size C and demand skewness a. The top value of each cell gives the exact 
characteristic time from solving Eq. J3} numerically while the bottom value gives the approximate characteristic time 
from Eq. 1(12)) . 

5 Numeric Results 

In this section we first compare the accuracy of the approximate characteristic time that we obtain 
from Eq. I|12[) with the exact characteristic time that we obtain from solving Eq. ([3]) numerically. 
Table [T] provides such a comparison drawn from a universe of N = 1000 objects and for varying a 
and C . Each cell of the table corresponds to an (a, C) pair and contains two numeric values: the top 
one is the exact characteristic time while the bottom one is the approximate one that we compute 
through our method. These values correspond to units of time, or equivalently, number of requests. 

One may observe that our approximation tracks closely the actual characteristic time. Deviations 
appear only under very skewed demand (e.g., a > 0.8) and large relative storage capacities (e.g., 
C/N > 20%). These cases, however, are neither typical, nor really interesting, for the following 
reasons. First, cache memories rarely operate under so much storage. Typical values for the ratio 
C/N are well below 10% in most applications (this is after all the main reason for employing caches 
- lack of memory space for all the objects). Second, a high availability of storage, combined with a 
high skewness, leads to a fairly expected cache hit ratio that approaches 1 and, thus, there is not 
much practical purpose for studying such a case analytically. We note, however, that our method 
can be twicked in order to provide useful results for these cases also. We show how to do this later 
in this section. 

The next set of results compares the analytic per object steady-state hit probabilities obtained 
by plugging the characteristic time r of Eq. (fT2|) into Eq. fl} , with corresponding hit probabilities 
obtained by simulating LRU for 10 million requests. The three graphs of Fig. [2] correspond to 
skewness a — 0.4, 0.6 and 0.8. Each graph includes 8 curves corresponding to results obtained 
from simulation and analysis under different ratios C/N = 5%, 10%, 15% and 20%. One may 
observe that for low (a = 0.4) and medium (a = 0.6) skewness, the analytically computed hit ratios 
match almost perfectly with the simulated ones, across all storage availabilities. For high skewness 
(a = 0.8), our results are very accurate up to a storage availability of 10% and then start to deviate 
(some deviation for C/N = 15% and a larger one for C/N — 20%). In other words, the method 
becomes less accurate under very skewed demand and large availability of storage. The reason for 
this deviation is that under such settings, the omission of higher order terms of the Taylor series 
expansion of the previously mentioned exponential forms, disrupts significantly the balance of the 
(normalization) Eq. ([3]), thus leading to 7Tj's that do not sum up to C. As we commented earlier, a 
storage availability higher than 10% is not realistic under most caching applications. Nevertheless, 
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Algorithm 1 ProportionalNormalization(7r: 1 x JV vector, N, C scalars) 

1: for i = 1 to N do 

2: mjmass = C — 5ZjLi "j! 
3: 5 = mjmass ■ 7Tj/ 5Zj=i "ji 
4: 7r; = min{7Ti + <5, 1}; 



in the following paragraph we will describe proportional normalization, a method for fixing this 
problem by reshaping the n^s and, actually, achieving a high accuracy even under high storage 
availability and skewed demand. 

Proportional normalization: In this section we describe a simple normalization method for 
fixing the missing probability mass problem that occurs under combined high C/N and a. This 
is achieved through a proportional normalization method that distributes the missing probability 
mass among the different objects in such a way that each object's hit probability is incremented 
proportionally to its hit probability as derived by our base-line closed-form method. In Algorithm!]] 
we describe the proportional normalization method. The algorithm takes as input the vector of hit 
probabilities derived from Eq. fl} after plugging in the analytically computer characteristic time r 
and returns a normalized vector of hit probabilities that sum up to C. In Fig. [3] we compare the 
normalized hit probabilities with the corresponding ones from simulation under a storage availability 
C/N — 20% (under such availability, and for high skewness, there was a substantial disagreement 
between simulation and analysis, as shown in the third graph of Fig. [2]). From Fig. [3] it is clear that 
after the normalization there is almost perfect agreement between the simulation and the analytic 
results. Thus by combining our analytic method with proportional normalization, one can obtain 
accurate hit ratios even under combined high storage availability and skewed demand. 

6 Conclusions 

In this work we have presented a closed-form approximate method for obtaining the per object hit 
ratio under LRU replacement and independent generalized power-law requests. Our method obtains 
accurate results for a wide range of parameters. It becomes less accurate only when combining a 
very high storage availability (which is not typical under most caching applications) with skewed 
demand. To accommodate this case, we describe a simple proportional normalization procedure 
that, when combined with our baseline closed- form method, corrects its accuracy. To the best of 
our knowledge, our method is the first one to produce non asymptotic closed-form results for LRU. 
Due to the complete lack of any kind of numeric computation our method can be used for the 
analysis of large networks of LRU caches in which existing numeric methods and simulation become 
impractical from a computational point of view. 
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Figure 2: Comparison of simulated and analytic per object hit probabilities (7Tj's) on a universe of N = 1000 
objects for different storage capacities (C = 50, 100, 150, 200) and skewness parameters (a = 0.4, 0.6, 0.8) for the 
input generalized power-law demand. A word of caution: in the third graph (a = 0.8) the analytic line for C = 200 
overlaps coincidentally with the simulation line for C = l|5p. 
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Figure 3: Comparison of simulated and proportionally normalized analytic per object hit probabilities (ni's) on a 
universe of TV = 1000 objects for a storage capacity C = 200 and different skewness parameters (a = 0.4,0.6,0.8) for 
the input generalized power-law demand. 
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' Abstract 

We consider the well known Least Recently Used (LRU) replacement algorithm and 
analyze it under the independent reference model and generalized power-law demand. For 
this extensive family of demand distributions we derive a closed-form expression for the 
| per object steady-state hit ratio. To the best of our knowledge, this is the first analytic 



derivation of the per object hit ratio of LRU that can be obtained in constant time with- 
out requiring laborious numeric computations or simulation. Since most applications of 
replacement algorithms include (at least) some scenarios under i.i.d. requests, our method 
has substantial practical value, especially when having to analyze multiple caches, where 
existing numeric methods and simulation become too time consuming. 

X 
& ' 

1 Introduction 



Although very simple in both conception and implementation, the LRU replacement algorithm 

is notoriously hard in terms of analysis. Attempts to obtain the per object steady-state hit 

ratio in an LRU operated cache under the independent reference model (IRM) [?] date back to 

the early 70's and have continued appearing in the literature until very recently [?, ?, ?]. As 

elaborated later on in this article, such attempts yield either (1) intractable numeric methods 

for obtaining the exact hit probabilities [?, ?, ?], (2) tractable numeric methods for obtaining 

approximate hit probabilities [?, ?, ?, ?, ?], or (3) asymptotic results under infinite number 

'Work conducted at Boston University with support from a Marie Curie Outgoing International Fellowship 
of the EU (MOIF-CT-2005-007230). 
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of objects and infinite storage capacity [?, ?]. In this article we derive for the first time a 
closed-form formula that can be used for obtaining approximate hit probabilities in constant 
time, i.e., without numeric computation that depends on input parameters like the number 
of objects and the storage capacity. Although previous approximate numeric methods are 
fast (linear complexity), being able to compute the hit probabilities in constant time gives 
a significant advantage, especially when the object universe is large or when there are more 
than one caches to be analyzed. Examples include networks of inter-connected cooperative 
caches [?, ?, ?], peer-to-peer caching systems [?], semantic caching and query processing [?]. 

We achieve the aforementioned result for generalized power-law demand distributions [?, ?]. 
Our interest on this family is based on the fact that such popularity profiles have been observed 
in many real- world measurement studies related to replacement algorithms, including [?, ?]. 
It is also quite a versatile family as it includes a wide range of profiles, from uniform (having 
skewness parameter a = 0) to Zipf (having skewness parameter a = 1). Most new applications 
that include a cachqj that operates under LRU replacement, typically include among others, 
experimental results under power-law popularity; in these cases our closed-form method can 
be used instead of laborious numeric methods or simulation. 



2 Related Work 

The problem of analyzing the hit ratio of LRU can be traced back to the 70's. King [?] was 

the first to derive the steady-state behavior of LRU under IRM. Initial attempts employed 

a Markov chain to model the contents of a cache operating under LRU. Unfortunately, such 

attempts give rise to huge Markov chains, having C!(^) states (where N denotes the total 

number of distinct objects, and C denotes the capacity of the cache in unit-sized objects); 

numerical results for such chains can only be derived for very small N and C. More efficient 

steady-state formulas have been derived by avoiding the use of Markov chains, and instead 

making combinatorial arguments; see Koffman and Denning [?], and Starobinski and Tse [?]. 

However, such approaches still incur a computational complexity that is exponential in N 

and C. Flajolet et al. [?] have presented integral expressions for the hit ratio, which can be 

1 We would like at this point to emphasize the distinction between caching systems and replacement algo- 
rithms. A caching system involves many more design choices other than the particular replacement algorithm 
(there are issues of associativity, multi- level hierarchical structure, and others). The current article is about 
analyzing a particular replacement algorithm and does not make any claims about the more general problem 
of designing cache memories. 



2 



approximated using numerical integration at complexity 0{NC). Dan and Towsley [?] have 
derived an 0{NC) iterative method for the approximation of the hit ratio. Jalenkovic [?] has 
provided a closed form expression for the particular case of generalized power-law demand 
with skewness parameter a > 1, for the asymptotic case, N,C — ► oo. The same author has 
shown that the hit ratio of LRU under such demand is asymptotically insensitive for large 
caches, i.e., C — > oo, to temporal correlations of the request arrival process [?]. The most 
recent attempts on the analysis of LRU can be found in [?, ?, ?]. These works build on the 
notion of characteristic time, which is also used in our work. More details on these works and 
the concept of the characteristic time are given in the following sections. 

3 Background and Scope 

Consider an object set O = {o±, . . . , ojv}, where Oj denotes the ith unit-sized object. Assume 

that requests are issued for the objects of O and that successive requests are independent and 

identically distributed according to a common probability distribution p = {p\, . . . ,pjsr}, where 

Pi denotes the request probability for the ith most popular object of O (hereafter assumed to be 

object Oi without loss of generality). The aggregate stream of requests is assumed to be arriving 

to a cache according to a Poisson arrival procesjfl of rate A requests/unit of time (meaning 

that the stream of request for any given object is also Poisson with rate A • pi, 1 < i < N). In 

Laoutaris et al. [?] we showed that under the above mentioned request model, an LRU operated 

cache with capacity for C unit-sized objects reaches a steady-state in which the probability of 

2 The independent reference model [?] is commonly used to characterize cache access patterns [?, ?]. The 
impact of temporal correlations was shown in [?, ?] to be minuscule, especially under typical, Zipf-like object 
popularity profiles. These works showed that temporal correlations decrease rapidly with the distance between 
any two samples so, as long as the cache size is not minuscule, they do not impact fundamentally on the i.i.d. 
assumption. The unit assumption regarding the size of objects is a standard one in all previous works [?]- 
[?] and stems from the desire to avoid adding 0/1-knapsack type complexities to a problem that is already 
combinatorial. Practically, it is justified on the basis that in many caching systems the objects are much 
smaller than the available cache size. Similarly, all previous works assume stationarity of demand over some 
time horizon. This is supported by many of the aforementioned measurement works, over multiple time scales. 
Obviously, if the demand is non stationary and radically changing over small time scales, no analysis can be 
carried out. 

3 We can alternatively obtain similar results by assuming a Bernoulli arrival process and carrying-out a 
discrete time analysis. We choose to remain on the continuous time domain so as to be aligned with the 
preceding body of work in [?, ?, ?]. 
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finding object in the cache is given by: 



m = 1 - e~ p ^ (1) 

In the above equation, r« denotes the maximum inter-arrival time between two adjacent request 
for object Oj, both of which lead to hits. This quantity is referred to as the characteristic time 
of object Oi and is due to Che et al. [?]. In essence, r% is a random variable, but it can be 
approximated by a constant in order to carry-out a tractable analysis. This is characterized 
as a mean field approximation in [?] and the rationale behind it is that when the object set is 
large enough, rj fluctuates closely around its mean value, so it can be effectively approximated 
by it. The characteristic time of object o,, 1 < i < N, was obtained in [?, ?] by solving the 
following equation numerically: 

N N 

1 - e -» r< = C => e ~ Pjn =N-1-C (2) 

3=1 3=1 
j^i 3^-i 

This equation gives the time interval that is required for the N — 1 other objects to generate 
C distinct requests^ and thus evict Oj, granted that o\ is not re-requested in this interval. 
However, solving N such equations, one for each object, is cumbersome, especially for large N. 
This can be partially alleviated by considering a single characteristic time r for all the objects 
and thus solving only one equation. Such an approximation is justifiable on the basis that 
the characteristic times rj of different objects do not differ substantially, even under skewed 
popularity distributions. Figure [T] supports this claim by illustrating the characteristic times 
of objects in an LRU cache with capacity for C = 100 objects that is driven by requests over an 
object universe of N = 1000 objects, whose popularities follow a generalized power-law with 
skewness a = 0.8 (the request rate for this and all subsequent examples is normalized to A = 
1 request/unit of time). The characteristic times are obtained by solving Eq. ((2J) numerically. 
One can observe that although p is skewed, the difference between the characteristic times 
of different objects is very small (thus ri/riooo = 1.011 despite that pi/piooo = 251, i.e., two 
orders of magnitude apart). The plot essentially says that request inter-arrivals for the same 
object that are longer than 134-135 time units lead to misses. 

The approach of using a common characteristic time r for all objects was recently employed 
by Panagakis et al. in [?]. In the same work it was observed that the most natural way of 



4 Observe that the quantity within the summation is the CDF of the exponential request inter-arrival time 
for object o 3 - calculated at point r%. 
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Figure 1: An example with the characteristic times calculated by Eq. ((2]) 



finding the common characteristic time is by solving the following normalization equation 
which simply requires that all the steady-state object hit probabilities sum up to the capacity 
of the cache, i.e.: 



N 



N 



N 



= c 1 - e ~ p ' r = C ^Y1 e ~ nr = N - c 



(3) 



i=l i=l i=l 

The above equation was solved numerically in [?], similarly to the case of [?, ?] and Eq. ([2]). 
In the following section, we utilize the notion of characteristic time as developed in [?, ?, ?] 
and present an analysis that leads to the derivation of a closed-form formula for the behavior 
of LRU caching. This is, to the best of our knowledge, the first, non-asymptotic, closed-form 
approximate formula for LRU (the closed-form expression of Jalenkovic in [?] covers only the 
asymptotic case N, C —* oo and is only for a > 1). Our method can be used for the study 
of LRU caching, whether in stand-alone mode (a single LRU cache), or, more interestingly, 
in hierarchical [?, ?] or distributed [?] inter-connections of caches, without requiring laborious 
numeric computations. 



4 Analysis of LRU under Generalized Power-Law Demand 

We assume that p follows a generalized power-law distribution, in which the ith. most popular 
object has request probability pi = A/i a , where A = (X^ =1 ya)" 1 is a normalization constant, 
and a is a skewness parameter. Under such demand, we show how to obtain an approximate 
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closed- form formula for the common characteristic time r of Eq. ([3]). This gives directly a 
closed- form expression for the hit ratio of each object through Eq. (pQ). Our analysis can be 
easily adapted to handling per object characteristic times 7j. The only difference in this case 
would be that we would start from Eq. ([2]) instead of Eq. ([3]). 

First we take the Taylor series expansion of the exponential form e~ PiV in terms of the 
variable r around point C: 



-PiC . ^ (~Pi ■ ( r ~ C )) k 



k=0 



(4) 



The exponential form e PiC of Eq. (J3|) can be similarly expanded in terms of the variable pi 
around point as follows: 



-PiC 



E 

k=0 



-PiCf 



(5) 



Using Eqs © in Eq. ([3]) we can write: 



N N / oo 



-PiC) 1 

k\ 



E 



(- Pi ■ (r - C)f 
k\ 



N-C 



(6) 



i=l i=l \k=0 / \k=0 

Denoting = (—C) k /k\ and b^ = (— (r — C)) k /k\, and limiting k to < k < K instead of 
letting it run to oo, we can approximate Eq. ([6|) as follows: 



i=l \k=0 / \k=0 / 

/ / 



N 

E 

i=i 



2K 

E 

m=0 



2K 



m=0 



N-C 



y~] a mi ■ b 



m l ' m 2 ■ 

Vm ^ < A' , rn <2<K 
m-j^ -\-rti<£=m 



\ 



^ ^ Q"mi ' b m 2 



m l > m 2 ■ 



m 2 



V 



(7) 
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As will be shown later through numeric examples, the truncation to K has a small effect on 
the accuracy as compared to solving Eq. ([7|) for K — ► 00. This owes to the fact that the 
remainder for k > K of the previous exponential forms dH), ([5]) can be bounded by 0(1/K\). 

We continue the analysis by putting into use our assumption that pi follows a power-law 
distribution, and so we can write: 

N IV N 



iv / A \ m 1 

i=l i=l v ' i=l 



H 



(am) 
N J 



(8) 



where = ^2l=x V' a denotes the iVth Harmonic number of order a. H^' can be approxi- 

(see also [?]). Substituting from Eq. (JSj) into 



(a) 



mated by its integral expression Hffi ~ 
Eq. (J7J) we obtain our master equation: 

( ( 



N 1 



1-a 



2K 

E 

m=0 



E 



J m.2 



m l ; m 2 : 
< A', mo < K 
mj +m2 = m 



\ 



■A m -H 



(am) 
N 



J 



N -C 



0) 



The master equation is a i^T-order polynomial equation of r (corresponding to an approximate 
version of Eq. ([6]) that retains only K + 1 first terms from the Taylor series expansions of the 
exponential forms of Eqs ©). One can solve the master equation in arbitrary accuracy 
by increasing K. This, of course, presumes a numerical solution and, thus, does not differ 
fundamentally from the previous numerical approaches in [?, ?, ?]. Where the master equation 
is essentially different, is in that it has a form that can be utilized for setting up a closed- form 
solution. This can be accomplished by selecting appropriately small K that give rise to such 
results. Such flexibility is not provided by Eqs. ([2]), (j3]). 

Consider the case of K = 2. Substituting a m ,b m and doing some algebraic manipulation 
reduces the master equation into the following quadratic equation (K = 2 amounts to retaining 
the first three terms of the Taylor series expansions of Eqs ((4]), ©): 



ci2T + ct\r + ao = 



A 2 



where: a 2 = ~^H N 



(2a) 



A 3 C 



-H 



(3a) 
N 



+ 



A 4 C 2 



-H 



(4a) 
N 



a i 



(10) 



a — O H — .Hjy 



The characteristic time can then be taken by selecting an appropriate real solution (assuming 
that one exists, more on this in the sequel) from the quadratic formula: r = — t|: '• " ' 



2a 2 



We can go a step further and consider the case of K = 3 which yields the following cubic 
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equation: 

a 3 r + a 2 r + Ql r + a — U where. a 3 — — ^"-"tv "i q~ N 12~ N 36~ N 



A 2 (2a) A 4 C 2 (4a) A 5 C 3 (5a) A^C 4 (6a) 

t N -—r H " + — Hn ~~[2~ Hn 



'■" . A 4 C 3 (4a) A 5 C 4 (5a) .\«r'„ 
6 TV 12 ^ 12 



ai - -AF^ H — iljy T7T h n + —^T h n 



A 4 C 4 (4a) A 6 C 6 (6a) 
ao-C-— ^ "^g-^TV 

(11) 

The cubic formula [?] (we do not repeat it here due to space considerations) returns the 
three solutions to the above cubic equation expressed as analytic functions of the coefficients 
«3, a.2, ol\, ao (which, in turn, are analytic functionjf] of the input parameters C, N, a); at least 
one the three solutions is always guaranteed to be in the domain of real numbers (such a guar- 
antee does not exist for the quadratic equation, for which, both solutions can be complex). Due 
to this guarantee, and also to the fact that it provides a closer approximation by considering 
an additional term from the Taylor expansion, we focus on the K = 3 casein Let ta,tb,ty be 
the three roots of Eq. (jllh returned by the cubic formula. We select as characteristic time the 
smallest real solution rx,X E {A,B,T} that exceeds C, i.e.: 

r = min (rx) :r i £R. r x >C (12) 

X€{A,B,T} 

The rationale behind this choice is that it takes at least C requests to evict a newly inserted 
object so the characteristic time has to be larger than C (the characteristic time is in units of 
time or alternatively in number of requests, since we have normalized the request rate A into 
1 req./time slot). In the next section we show that the above approximation yields accurate r 
and 7Tj across a wide range of parameters C, N, a. 



5 Numeric Results 

In this section we first compare the accuracy of the approximate characteristic time that we 

obtain from Eq. (I12p with the exact characteristic time that we obtain from solving Eq. ([3]) 

5 For the generalized Harmonic number we use its integral approximation as stated earlier on. 

6 Theoretically we could go even further and consider the quartic equation (K — 4). This, however, involves 

very cumbersome formulas for the roots and is marginally valuable since the cubic equation already provides 

close approximation as will be demonstrated in Sect. \S\ The quintic and all higher order equations (K > 5) do 

not posses a general solution over the rationals in terms of radicals (the "Abel-Ruffini" theorem). 
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a\C 


50 


100 


150 


200 


0.4 


51.8 
52 


107.5 
107.9 


167.5 
167.8 


232.2 
232.1 


0.6 


53.6 
54 


114.3 
113.9 


181.9 
178.6 


256.7 
248.9 


0.8 


59.6 
59.1 


133.8 
128.9 


220.2 
167.5 


318.6 
225.2 



Table 1: Evaluation of the accuracy of our approximate closed- form formula for the characteristic time on a 
set of N = 1000 objects, under varying cache size C and demand skewness a. The top value of each cell gives 
the exact characteristic time from solving Eq. ((3]) numerically while the bottom value gives the approximate 
characteristic time from Eq. (|12|l . 

numerically. Table Q] provides such a comparison drawn from a universe of N = 1000 objects 
and for varying a and C. Each cell of the table corresponds to an (a, C) pair and contains 
two numeric values: the top one is the exact characteristic time while the bottom one is the 
approximate one that we compute through our method. These values correspond to units of 
time, or equivalently, number of requests. 

One may observe that our approximation tracks closely the actual characteristic time. 
Deviations appear only under very skewed demand (e.g., a > 0.8) and large relative storage 
capacities (e.g., C/N > 20%). These cases, however, are neither typical, nor really interesting, 
for the following reasons. First, cache memories rarely operate under so much storage. Typical 
values for the ratio C/N are well below 10% in most applications (this is after all the main 
reason for employing caches - lack of memory space for all the objects). Second, a high 
availability of storage, combined with a high skewness, leads to a fairly expected cache hit 
ratio that approaches 1 and, thus, there is not much practical purpose for studying such a case 
analytically. We note, however, that our method can be twicked in order to provide useful 
results for these cases also. We show how to do this later in this section. 

The next set of results compares the analytic per object steady-state hit probabilities ob- 
tained by plugging the characteristic time r of Eq. (|12p into Eq. ([I]), with corresponding hit 
probabilities obtained by simulating LRU for 10 million requests. The three graphs of Fig. [2] 
correspond to skewness a = 0.4, 0.6 and 0.8. Each graph includes 8 curves corresponding to 
results obtained from simulation and analysis under different ratios C/N = 5%, 10%, 15% and 
20%. One may observe that for low (a = 0.4) and medium (a = 0.6) skewness, the analyti- 
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Algorithm 1 ProportionalNormalization(7r: 1 x N vector, N, C scalars) 
1: for i = 1 to N do 

2: mjmass = C — Ylj=i Ti! 
3: 5 = mjmass ■ iTi/ ^2,^ =i iTj; 
4: 7Ti = min{7Ti + S, 1}; 

cally computed hit ratios match almost perfectly with the simulated ones, across all storage 
availabilities. For high skewness (a = 0.8), our results are very accurate up to a storage avail- 
ability of 10% and then start to deviate (some deviation for C/N = 15% and a larger one for 
C/N = 20%). In other words, the method becomes less accurate under very skewed demand 
and large availability of storage. The reason for this deviation is that under such settings, the 
omission of higher order terms of the Taylor series expansion of the previously mentioned ex- 
ponential forms, disrupts significantly the balance of the (normalization) Eq. ([3]), thus leading 
to 7Tj's that do not sum up to C. As we commented earlier, a storage availability higher than 
10% is not realistic under most caching applications. Nevertheless, in the following paragraph 
we will describe proportional normalization, a method for fixing this problem by reshaping the 
7Tj's and, actually, achieving a high accuracy even under high storage availability and skewed 
demand. 

Proportional normalization: In this section we describe a simple normalization method 
for fixing the missing probability mass problem that occurs under combined high C/N and 
a. This is achieved through a proportional normalization method that distributes the missing 
probability mass among the different objects in such a way that each object's hit probability 
is incremented proportionally to its hit probability as derived by our base-line closed-form 
method. In Algorithm [T] we describe the proportional normalization method. The algorithm 
takes as input the vector of hit probabilities derived from Eq. (pQ) after plugging in the analyt- 
ically computer characteristic time r and returns a normalized vector of hit probabilities that 
sum up to C. In Fig. [3] we compare the normalized hit probabilities with the corresponding 
ones from simulation under a storage availability C/N = 20% (under such availability, and 
for high skewness, there was a substantial disagreement between simulation and analysis, as 
shown in the third graph of Fig. [2]). From Fig. [3] it is clear that after the normalization there is 
almost perfect agreement between the simulation and the analytic results. Thus by combining 
our analytic method with proportional normalization, one can obtain accurate hit ratios even 
under combined high storage availability and skewed demand. 
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6 Conclusions 



In this work we have presented a closed-form approximate method for obtaining the per ob- 
ject hit ratio under LRU replacement and independent generalized power-law requests. Our 
method obtains accurate results for a wide range of parameters. It becomes less accurate 
only when combining a very high storage availability (which is not typical under most caching 
applications) with skewed demand. To accommodate this case, we describe a simple propor- 
tional normalization procedure that, when combined with our baseline closed-form method, 
corrects its accuracy. To the best of our knowledge, our method is the first one to produce 
non asymptotic closed-form results for LRU. Due to the complete lack of any kind of numeric 
computation our method can be used for the analysis of large networks of LRU caches in which 
existing numeric methods and simulation become impractical from a computational point of 
view. 
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Figure 2: Comparison of simulated and analytic per object hit probabilities (7Tj's) on a universe of N = 1000 
objects for different storage capacities (C = 50, 100, 150, 200) and skewness parameters (a = 0.4, 0.6, 0.8) for 
the input generalized power-law demand. A word of c!2tion: in the third graph (a = 0.8) the analytic line for 
C — 200 overlaps coincidentally with the simulation line for C = 150. 
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Figure 3: Comparison of simulated and proportionally normalized analytic per object hit probabilities (-7T, 
on a universe of N = 1000 objects for a storage capacity C = 200 and different skewness parameters (a 
0.4, 0.6, 0.8) for the input generalized power-law demand. 
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